Abstract

The Maximum Depth classifier was the first attempt to use data depths instead of multivariate raw data in classification problems. Recently, the DD-classifier has fixed some serious limitations of this classifier, but some issues still remain. This paper is devoted to extending the DD-classifier in the following ways: first, to handle more than two groups; second, to apply regular classification methods (such as kNN, linear or quadratic classifiers, recursive partitioning, ...) to DD-plots, which, in particular, makes it possible to obtain useful insights through the diagnostics of these methods; and third, to integrate various sources of information (data depths, multivariate functional data, ...) in the classification procedure in a unified way. An enhanced revision of several functional data depths is also proposed. A simulation study and applications to some real datasets are also provided.


Keywords: DD-Classifier, Functional Depths, Functional Data Analysis 

1 Introduction 

In this paper we explore the possibilities of depths in classification problems in multidimensional or functional spaces. Depths are relatively simple tools intended to order the points in a space depending on how deep they are with respect to a probability distribution, $P$.
In the one-dimensional case, it is easy to order points with respect to $P$, with the median being the innermost point and the extreme percentiles the outermost ones. Moreover, if $F_P$ denotes the distribution function of $P$, then

$$D_P(x) = \min\{F_P(x),\, 1 - F_P(x)\} \qquad (1)$$

is an index which measures how deep $x \in \mathbb{R}$ is with respect to $P$. This index can also be applied to samples, replacing $F_P$ by the empirical distribution function. Other possibilities for defining $D_P(x)$ are available (see, for instance, Subsection 2.1.1), including those in which $D_P(x)$ decreases with the distance between $x$ and the mean of $P$, which, in turn, is the deepest point. Most of them are positive and bounded, and the bigger the index, the deeper the point.
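As a minimal illustration of the sample version of such an index, the following Python sketch evaluates the empirical depth $\min\{F_N(x), 1 - F_N(x)\}$, where $F_N$ is the empirical distribution function. The function name `univariate_depth` and the toy sample are assumptions made for this example only; they do not come from the paper.

```python
import numpy as np

def univariate_depth(x, sample):
    """Empirical depth min{F_N(x), 1 - F_N(x)} of x w.r.t. a 1-D sample."""
    sample = np.sort(np.asarray(sample, dtype=float))
    # Empirical distribution function: F_N(x) = #{sample <= x} / N
    F = np.searchsorted(sample, x, side="right") / sample.size
    return np.minimum(F, 1.0 - F)

# Toy sample 1, 2, ..., 10 (illustrative only)
sample = np.arange(1, 11)
depths = univariate_depth(np.array([1, 5, 10]), sample)
# Central points are deeper than extreme ones
```

With this definition the depth never exceeds 1/2, and points near the sample median obtain the largest values.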

In the multidimensional case there exists no natural order; thus, ordering the points from the inner to the outer part of a distribution or sample is not so easy. To overcome this difficulty, several depths have been proposed using different approaches. A nice review of multivariate depths is Liu et al (1999).


To the best of our knowledge, the first paper in which depths were used for classification was Liu (1990), where the Maximum Depth classifier (MD-classifier) was proposed: given two probability measures (or classes, or groups) $P$ and $Q$, and a depth, $D$, we classify the point $x$ as produced by $P$ if $D_P(x) > D_Q(x)$. This procedure was fully developed in Ghosh and Chaudhuri (2005).
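The maximum depth rule can be sketched in a few lines of Python. The sketch below is only illustrative: it swaps in the Mahalanobis depth (because it has a closed form) rather than any particular depth used in the literature, and the names `mahalanobis_depth` and `md_classify`, as well as the simulated samples, are assumptions of this example.

```python
import numpy as np

def mahalanobis_depth(x, sample):
    """Mahalanobis depth: 1 / (1 + squared Mahalanobis distance to the sample mean)."""
    mu = sample.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(sample, rowvar=False))
    diff = np.atleast_2d(x) - mu
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return 1.0 / (1.0 + d2)

def md_classify(x, sample_p, sample_q):
    """Assign 'P' where D_P(x) > D_Q(x) and 'Q' otherwise."""
    dp = mahalanobis_depth(x, sample_p)
    dq = mahalanobis_depth(x, sample_q)
    return np.where(dp > dq, "P", "Q")

rng = np.random.default_rng(0)
sample_p = rng.normal(size=(500, 2))                # standard bivariate normal
sample_q = rng.normal(size=(500, 2)) + [2.0, 2.0]   # mean shifted to (2, 2)
new_points = np.array([[0.0, 0.0], [2.0, 2.0]])
labels = md_classify(new_points, sample_p, sample_q)
```

In this location-shift setting the rule behaves sensibly: each group center is assigned to its own group.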

The MD-classifier looks quite reasonable, but it has some drawbacks which are better understood with the help of DD-plots. Those were introduced in Liu et al (1999) for graphical comparison of two multivariate distributions or samples (see also Li and Liu (2004)). Given two probability distributions, $P$ and $Q$ on $\mathbb{R}^p$, a DD-plot is a two-dimensional graph (regardless of $p$) in which, for every $x \in \mathbb{R}^p$, the pair $(D_P(x), D_Q(x)) \in \mathbb{R}^2$ is represented. Examples of DD-plots appear in Figures 1 and 2. Thus, the MD-classifier assigns to $Q$ (resp. to $P$) the points whose representation in the DD-plot is above (resp. below) the main diagonal. Figure 1 contains two DD-plots corresponding to samples from bidimensional normal distributions, where $P$, in both cases, is standard. The mean of $Q$ in the first DD-plot is $(2, 2)^t$ and its covariance is the identity. In the other case $Q$ is centered but its covariance is twice the identity. In both graphs, points in black come from $P$ and points in gray from $Q$. We have employed the Halfspace Depth (HS) (see Liu et al (1999)). All sample sizes are 500. In both graphs the main diagonal is also drawn.

The MD-classifier is optimal in the first case, but it is plainly wrong in the second one since it classifies almost all points as produced by $Q$. The idea developed in Li et al (2012) is that the DD-plot contains information enabling a good classifier to be obtained. For instance, in the second DD-plot in Figure 1 the proportion of gray points is very high in an area close to the vertical axis. Then, Li et al (2012) proposed replacing the main diagonal by a function whose graph splits the DD-plot into two zones with the lowest misclassification rate (in that paper only the polynomial case is fully developed). This is termed the DD-classifier.

Figure 1: DD-plots of two samples drawn from two-dimensional normal distributions. In both cases $P$ is a standard 2-dimensional distribution. $Q$ differs from $P$ in the mean in the first DD-plot and in the covariance matrix in the second. [Panels: DD-plot #1, DD-plot #2; horizontal axis: depth w.r.t. sample P.]

The DD-classifier is a big improvement over the MD-classifier and, in the problem cited above, according to Li et al (2012), the DD-classifier gives a classification very close to the optimal one. However, an important limitation of the DD-classifier is that it is unable to deal efficiently with more than two groups. The solution to this problem for $g$ groups in Li et al (2012) was to apply a majority voting scheme, which increases the computational complexity because $\binom{g}{2}$ two-group problems must be solved.

Moreover, there are some two-group cases in which a function cannot split the points in the DD-plot correctly. Let us consider the situation presented in Figure 2. The points in the scatterplot come from two samples, with 2,000 points each. The gray points were taken from a uniform distribution, $Q$, on the unit ball centered at $(-1, 0)^t$. The black points are from a distribution $P$ which is uniform on the union of two rings: a ring centered at $(-1, 0)^t$ with inner (resp. outer) radius of 0.5 (resp. 0.6), and a ring of the same size centered at $(0.3, 0)^t$. The optimal classifier assigns points in both rings to $P$ and the rest to $Q$. The associated DD-plot is also shown in Figure 2. It is obvious that no function can split the DD-plot into two zones giving the optimal classifier, since this would require a function separating the areas with black points from the rest of the DD-plot, which is impossible. This problem, in this particular case, could be fixed by interchanging the axes. But it is possible to imagine a situation in which this rotation is not enough.

Figure 2: Scatterplot of two uniform samples and associated DD-plot. [Panels: (a) Scatterplot; (b) DD-plot, horizontal axis: depth w.r.t. sample P.]


There are also several depths valid in functional spaces (we present some of them in Section 2.1). Those depths can also be applied in classification problems making use of the DD-classifier, but they suffer from the same problems we mentioned in the multidimensional case. Moreover, another limitation of the DD-plot is its inability to take into account information coming from different sources. This shortcoming is more important in the functional setting, where some transformations of the original curves (such as derivatives) could be used for classification purposes simultaneously with the original trajectories.

In this paper we present the $DD^G$-classifier as a way to fix all the mentioned shortcomings of the DD-classifier in the functional setting, although the procedure can also be applied to multivariate data or to the cartesian product of functional spaces with multivariate ones. In fact, the $DD^G$-classifier can handle more than two groups and can incorporate information coming from different sources. The price we pay for this is an increase in the dimension, which goes from 2 in the DD-plot to the number of groups times the number of different sources of information to be handled. The $DD^G$-classifier can also handle more than one depth simultaneously (increasing the dimension again). The letter G in the name of the procedure refers to this increased dimension. Finally, it allows the use of regular classifiers (like kNN, SVM, ...). Since it is no longer compulsory to use functions to separate groups, it is possible, for instance, to identify "islands" inside a $DD^G$-plot, avoiding the need to use rotations.

Concerning the combination of information, it is worth mentioning that, on the one hand, in Section 2.1 we include some extensions of well-known depths that allow new depths to be constructed taking into account pieces of information from several sources; and, on the other hand, that some of the diagnostic tools of the classification procedures employed inside the $DD^G$-classifier can be used to assess the relevance of the available information. To keep the paper short, we only show this idea in the second example in Section 3, where we conclude that the relevant information is contained in the second derivative of the curves.

The paper is organized as follows: in Section 2 we present the basic ideas behind the proposed classifier. Section 2.1 is devoted to presenting some functional depths and to analyzing some modifications which could improve them. In Section 3 we show two examples of several classifiers applied to DD-plots. Section 4 contains the results of some simulations as well as applications to some real datasets. The paper ends with a discussion of the proposed method.


2 $DD^G$-Classifier


In Li et al (2012), the DD-plot is defined, in the case in which only two groups are involved, as a two-dimensional graph where the pairs $(D_1(x), D_2(x))$ are plotted. Here, $D_i(x)$ is the depth of the point $x$ with respect to the data in the $i$-th group. With this notation, the DD-plot is, to put it simply, a map between the (functional) space $\mathcal{X}$ where the data are defined, and $\mathbb{R}^2$:

$$x \mapsto (D_1(x), D_2(x)) \in \mathbb{R}^2.$$

The DD-classifier tries to identify the two groups using the information provided by the DD-plot. Since we have transformed our data to be in $\mathbb{R}^2$, the task of separating classes is carried out in a much simpler framework, assuming that the depths contain relevant information about how to separate the groups. Thus, the choice of a depth has now become a crucial step. In Li et al (2012) the classification rule was a polynomial function (up to a selected order $k$), ensuring that the point $(0, 0)^t$ belongs to it. This rule has three main drawbacks. First, the number of different polynomials of order $k$ that can serve as a classification rule is $\binom{N}{k}$, where $N$ is the sample size. This is the number of possible ways to select $k$ points from $N$, and each of the selections has an associated order $k$ polynomial which interpolates between these $k$ points and $(0, 0)^t$. Clearly, as $N$ increases, the complexity of the estimation process grows at the rate $N^k$. Second, the problem of classifying more than two groups was solved in Li et al (2012) using majority voting, which requires repeating the procedure for every pair of groups. This means that the optimization must be solved $\binom{g}{2}$ times, where $g$ is the number of groups. Also, to prevent the classification rule from depending on the pre-specified order of the groups, the optimization procedure must be repeated interchanging the axes of the DD-plot. So, the number of polynomial models that must be computed to create the classification rule is $2\binom{g}{2}\binom{N}{k}$, which can be extremely large. Finally, polynomials always give borders between groups which do not allow the construction of zones assigned to one group included in a zone assigned to the other, like the horizontal black band between the gray zones in the DD-plot in Figure 2.
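To get a feeling for how quickly this count grows, the following Python sketch evaluates $2\binom{g}{2}\binom{N}{k}$ for a few illustrative values. The function name and the chosen values of $N$, $k$ and $g$ are assumptions of this example, not settings taken from the paper.

```python
from math import comb

def dd_polynomial_models(n_sample, order, n_groups):
    """Candidate polynomial models: 2 * C(g, 2) * C(N, k)."""
    return 2 * comb(n_groups, 2) * comb(n_sample, order)

# Illustrative values: N = 100 sample points, polynomials of order k = 2
two_groups = dd_polynomial_models(n_sample=100, order=2, n_groups=2)
three_groups = dd_polynomial_models(n_sample=100, order=2, n_groups=3)
# Going from two to three groups triples the number of models
```

Even for this modest sample size the exhaustive search involves thousands of candidate polynomials, which is why implementations typically evaluate only a random subset of the combinations.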

The $DD^G$-classifier which we propose here tries to offer a unified solution to these drawbacks. Suppose that we have a process in the product space $\mathcal{X} = \mathcal{X}_1 \times \cdots \times \mathcal{X}_p$, i.e., multivariate (functional) data, where we have $g$ groups (classes or distributions) to be separated using data depths. Let us begin by assuming that $p = 1$. The $DD^G$-classifier begins by selecting a depth $D$ and computing the following map:

$$x \mapsto d = (D_1(x), \ldots, D_g(x)) \in \mathbb{R}^g.$$

We can now apply any available classifier that works in a $g$-dimensional space to separate the $g$ groups. The same idea is applied in Lange et al (2014). The main differences between Lange et al.'s proposal and ours are that in the former only finite-dimensional data are considered, and this map is a preliminary step to constructing what is called the feature space. Then, the authors only use a special kind of linear classifier on this feature space which requires making pairwise comparisons, thus classifying points using a majority vote scheme. Mosler and Mozharovskyi (2015) apply this classifier to functional data, but only after applying a dimension-reduction technique to the data.
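The two-step idea (map every observation to its vector of depths, then classify in depth space) can be sketched as follows for $p = 1$ and $g = 3$ groups. This is a hedged illustration with finite-dimensional data: the Mahalanobis depth stands in for whichever depth is selected, a plain kNN vote stands in for the classifier, and all names (`depth_map`, `knn_predict`) and the simulated groups are assumptions of this example.

```python
import numpy as np

def mahalanobis_depth(x, sample):
    """Mahalanobis depth of the rows of x with respect to a sample."""
    mu = sample.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(sample, rowvar=False))
    diff = np.atleast_2d(x) - mu
    return 1.0 / (1.0 + np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

def depth_map(x, groups):
    """Map each row of x to d = (D_1(x), ..., D_g(x))."""
    return np.column_stack([mahalanobis_depth(x, s) for s in groups])

def knn_predict(d_train, y_train, d_new, k=5):
    """Majority vote among the k nearest training points in depth space."""
    dist = np.linalg.norm(d_new[:, None, :] - d_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return np.array([np.bincount(v).argmax() for v in y_train[nearest]])

rng = np.random.default_rng(1)
centers = [(0, 0), (4, 0), (0, 4)]                  # three well-separated groups
groups = [rng.normal(size=(200, 2)) + c for c in centers]
x_train = np.vstack(groups)
y_train = np.repeat(np.arange(3), 200)

d_train = depth_map(x_train, groups)                # 600 x 3 matrix of depths
d_new = depth_map(np.array(centers, dtype=float), groups)
pred = knn_predict(d_train, y_train, d_new)
```

Note that no pairwise comparison or majority voting across two-group problems is needed: the three groups are handled at once in the 3-dimensional depth space.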

The extension of the procedure to the case $p > 1$ is simple: we only need to select an appropriate depth for each subspace $\mathcal{X}_i$ and consider the map

$$\mathcal{X} = \mathcal{X}_1 \times \cdots \times \mathcal{X}_p \to \mathbb{R}^G, \qquad x = (x_1, \ldots, x_p) \mapsto d = (\vec{D}^1(x_1), \ldots, \vec{D}^p(x_p)),$$

where $\vec{D}^i(x_i)$ is the $g$-dimensional vector giving the depths of the point $x_i \in \mathcal{X}_i$ with respect to the groups $1, \ldots, g$, and $G = g \times p$.

Our last consideration is related to the selection of the depth. As we stated before, the chosen depth may influence the result. The solution in Li et al (2012) was to select the right depth by cross-validation. In principle, an obvious solution could be to include all the depths at the same time and, from the diagnostics of the classification method, select which depths are useful. But this approach increases the dimension of the vector $d$ up to $G = g \sum_{i=1}^{p} l_i$, where $l_i \geq 1$ is the number of depths used in the $i$-th component. Clearly, the advantage of this approach depends on how the classification method can handle the information provided by the depths. Instead, we propose to select the useful depths while trying to keep the dimension $G$ low. This choice can be made using the distance correlation $\mathcal{R}$ (see Szekely et al (2007)), which characterizes independence between vectors of arbitrary finite dimensions. Recently, in Szekely and Rizzo (2013), a bias-corrected version was proposed. Here, our recommendation is to compute the bias-corrected distance correlation between the multivariate vector of depths ($d$) and the indicator of the classes ($Y = (1_{\{x \in C_1\}}, 1_{\{x \in C_2\}}, \ldots, 1_{\{x \in C_g\}})$) and to select the depth that maximizes the distance correlation among the available ones. In subsequent steps, other depths can be added, provided that the new depth has a low distance correlation with those selected in previous steps.

Also, using the recent extension of the distance correlation to functional spaces provided by Lyons (2013), this tool could be useful for assessing how much of the relation between the functional data and the indicator of the groups can be collected. Indeed, the computation of this measure is quite easy because it only depends on the distances among data (see Definition 4 in Szekely et al (2007)). Later, in Section 3.2, we provide an example of the application of these ideas.
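Because the distance correlation only depends on pairwise distances, it is short to compute. The Python sketch below implements the plain sample distance correlation based on double-centered distance matrices; it is not the bias-corrected version recommended above, and the function names, the simulated "depth" vectors and the selection step are assumptions made for illustration.

```python
import numpy as np

def _centered_distances(z):
    """Double-centered Euclidean distance matrix of the rows of z."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)
    return (dist - dist.mean(axis=0, keepdims=True)
            - dist.mean(axis=1, keepdims=True) + dist.mean())

def distance_correlation(x, y):
    """Plain (not bias-corrected) sample distance correlation."""
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = (a * b).mean()
    dvar2 = np.sqrt((a * a).mean() * (b * b).mean())
    return 0.0 if dvar2 == 0 else float(np.sqrt(max(dcov2, 0.0) / dvar2))

# Hypothetical candidates: one vector related to the groups, one pure noise
rng = np.random.default_rng(2)
labels = np.repeat([0, 1], 100)
indicator = np.column_stack([labels == 0, labels == 1]).astype(float)
informative = labels[:, None] + 0.3 * rng.normal(size=(200, 2))
noise = rng.normal(size=(200, 2))

scores = {"informative": distance_correlation(informative, indicator),
          "noise": distance_correlation(noise, indicator)}
best = max(scores, key=scores.get)   # candidate with the largest correlation
```

The same function can be reused in later steps to check that a newly added depth has a low distance correlation with those already selected.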


2.1 Data Depths for Functional Data 

As mentioned earlier, the $DD^G$-classifier is especially interesting in the functional context because it enables the dimension of the classification problem to be decreased from infinite to $G$. In this section, several functional data depths that will be used later with the $DD^G$-classifier are reviewed. Some extensions to cover multivariate functional data are also provided.


2.1.1 Fraiman and Muniz Depth (FM)

The FM depth (Fraiman and Muniz (2001)) was the first one to be proposed in a functional context. It is also known as integrated depth because of its definition. Given a sample $x_1, \ldots, x_N$ of functions defined on the interval $[0, T]$, let $S_t = \{x_1(t), \ldots, x_N(t)\}$ be the values of those functions at a given $t \in [0, T]$. Denote by $F_{N,t}$ the empirical distribution of the sample $S_t$ and by $D_i(t)$ a univariate depth of $x_i(t)$ in this sample (in the original paper, $D_i(t) = 1 - |1/2 - F_{N,t}(x_i(t))|$). Then the FM depth for the $i$-th datum is:

$$FM_i = \int_0^T D_i(t)\, dt. \qquad (2)$$

An obvious generalization of the FM depth is to consider different univariate depths to be integrated, like, for instance, the Half Space depth (HS, which is defined in (1)), the Simplicial depth (SD) or the Mahalanobis depth (MhD):

$$D_i^{SD}(t) = 2 F_{N,t}(x_i(t)) \left(1 - F_{N,t}(x_i(t)^-)\right),$$

$$D_i^{MhD}(t) = \left[1 + (x_i(t) - \hat{\mu}(t))^2 / \hat{\sigma}^2(t)\right]^{-1},$$

where $\hat{\mu}(t)$ and $\hat{\sigma}^2(t)$ are estimates of the mean and variance at point $t$.

The choice of a particular univariate depth modifies the behavior of the FM depth. For instance, the deepest curve may vary depending on this selection.
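For curves observed on a common grid, the FM depth with the original choice $D_i(t) = 1 - |1/2 - F_{N,t}(x_i(t))|$ can be approximated numerically. The following Python sketch assumes discretized curves stored row-wise and uses the trapezoidal rule for the integral; the function name `fm_depth` and the toy curves are illustrative assumptions.

```python
import numpy as np

def fm_depth(curves, t):
    """FM depth of each row of `curves` (N curves observed on the grid t)."""
    curves = np.asarray(curves, dtype=float)
    n = curves.shape[0]
    # F_{N,t}(x_i(t)): proportion of curves with value <= x_i(t) at each t
    counts = (curves[None, :, :] <= curves[:, None, :]).sum(axis=1)
    pointwise = 1.0 - np.abs(0.5 - counts / n)      # D_i(t)
    # Trapezoidal rule for the integral over [0, T]
    mid = 0.5 * (pointwise[:, 1:] + pointwise[:, :-1])
    return (mid * np.diff(t)).sum(axis=1)

# Five vertically shifted copies of the same curve on [0, 1]
t = np.linspace(0.0, 1.0, 101)
levels = np.arange(5.0)
curves = levels[:, None] + np.sin(2 * np.pi * t)[None, :]
depths = fm_depth(curves, t)
```

Replacing the line that computes `pointwise` by another univariate depth (for example the SD or MhD expressions) gives the corresponding generalization.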

An interesting scenario arises when we are faced with multivariate functional data; i.e., when the elements belong to a product space of functional spaces: $\mathcal{X} = \mathcal{X}^1 \times \cdots \times \mathcal{X}^p$. A depth combining the information of all components seems an appealing idea because it keeps the dimension of our classification problem low, but it does so at the risk of losing some information. This can be done in the following two ways:

• Weighted depth: given $x_i = (x_i^1, \ldots, x_i^p) \in \mathcal{X}$, compute the depth of every component, obtaining the values $FM(x_i^j)$, $j = 1, \ldots, p$, and then define a weighted version of the FM depth ($FM^w$) as:

$$FM_i^w = \sum_{j=1}^{p} w_j\, FM(x_i^j),$$

where $w = (w_1, \ldots, w_p)$ is a suitable vector of weights. In the choice of $w$, the differences in the scales of the depths must be taken into account (for instance, the FM depth using SD as the univariate depth takes values in $[0, 1]$, whereas the Half Space depth always belongs to the interval $[0, 1/2]$).

• Common support: suppose that all $\mathcal{X}^j$ have the same support $[0, T]$ (this happens, for instance, when using the curves and their derivatives). In this case, we can define a $p$-summarized version of the FM depth ($FM^p$) as:

$$FM_i^p = \int_0^T D_i^p(t)\, dt,$$

where $D_i^p(t)$ is a $p$-variate depth of the vector $(x_i^1(t), \ldots, x_i^p(t))$ with respect to $S_t$.


2.1.2 h-Mode Depth (hM)

The hM depth was proposed in Cuevas et al (2007) as a functional generalization of the likelihood depth to measure how surrounded one curve is with respect to the others. The population hM depth of a datum $x_0$ is given by:

$$f_h(x_0) = E\left[K\left(m(x_0, X)/h\right)\right],$$

where $X$ is a random element describing the population, $m$ is a suitable metric or semi-metric, $K(\cdot)$ is a kernel and $h$ is the bandwidth parameter. Given a random sample $x_1, \ldots, x_N$ of $X$, the empirical h-mode depth is defined as:

$$\hat{f}_h(x_0) = N^{-1} \sum_{i=1}^{N} K\left(m(x_0, x_i)/h\right). \qquad (3)$$

Equation (3) is similar to the usual nonparametric kernel density estimator, with one main difference: as our interest is focused on what happens in a neighbourhood of each point, the bandwidth is not intended to converge to zero when $N \to \infty$, and the only constraint is that the bandwidth should be large enough to avoid pathological situations. For instance, the bandwidth should not be so small that every point in the sample has the same depth, equal to $K(0)/N$. Our default choice for $h$ is the 15% quantile of the distances among different points in the sample, using the standard Gaussian density as $K$.
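A minimal sketch of the empirical h-mode depth with the default choices just described (Gaussian kernel, $h$ equal to the 15% quantile of the pairwise distances) is given below. It assumes curves discretized on a common grid and an $L_2$ metric approximated by the trapezoidal rule; the names `l2_distances` and `hmode_depth` and the toy data are assumptions of this example.

```python
import numpy as np

def l2_distances(curves, t):
    """Pairwise L2 distances between discretized curves (trapezoidal rule)."""
    diff2 = (curves[:, None, :] - curves[None, :, :]) ** 2
    integral = (0.5 * (diff2[..., 1:] + diff2[..., :-1]) * np.diff(t)).sum(axis=-1)
    return np.sqrt(integral)

def hmode_depth(curves, t, quantile=0.15):
    """Empirical h-mode depth of every curve in the sample."""
    dist = l2_distances(np.asarray(curves, dtype=float), t)
    n = dist.shape[0]
    # Bandwidth: quantile of the distances among different sample points
    h = np.quantile(dist[np.triu_indices(n, k=1)], quantile)
    kernel = np.exp(-0.5 * (dist / h) ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1)          # N^{-1} * sum_i K(m(x_0, x_i) / h)

# Twenty shifted curves forming a bulk plus one far-away curve (index 20)
t = np.linspace(0.0, 1.0, 101)
levels = np.append(np.linspace(-1.0, 1.0, 20), 5.0)
curves = levels[:, None] + np.sin(2 * np.pi * t)[None, :]
depths = hmode_depth(curves, t)
```

The isolated curve receives essentially only its own kernel contribution, $K(0)/N$, and is therefore the least deep element of the sample.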
A weighted depth of the components can be applied to use this depth with multivariate functional data. Another possibility in this case is to construct a new metric combining those defined in the components of the product space using a $p$-dimensional metric like, for example, the Euclidean; i.e., take

$$m\left((x_0^1, \ldots, x_0^p), (x^1, \ldots, x^p)\right) := \sqrt{m_1(x_0^1, x^1)^2 + \cdots + m_p(x_0^p, x^p)^2}, \qquad (4)$$

where $m_i$ denotes the metric in the $i$-th component of the product space. It is important here to ensure that the different metrics of the spaces have similar scales, so that no single component dominates the overall distance.


2.1.3 Random Projection Methods 

There are several depths based on random projections using basically the same scheme. Given a sample $x_1, \ldots, x_N$ of functions in a Hilbert space with scalar product $\langle \cdot, \cdot \rangle$, a unit vector $a$ in this space is randomly selected (independently of the $x_i$'s) and the data are projected onto the one-dimensional subspace generated by $a$. The sample depth of a datum $x$ is the univariate depth of the projection $\langle a, x \rangle$ with respect to the projected sample $\{\langle a, x_i \rangle\}_{i=1}^{N}$. Although theoretically a single projection is enough (see Cuesta-Albertos et al (2007)), random projection methods usually generate several directions, $a_1, \ldots, a_R$, $R > 1$, and summarize them in different ways. Here, we will use:

Random Projection (RP): Proposed in Cuevas et al (2007), it uses the univariate HS depth and summarizes the depths of the projections through the mean (using $R = 50$ as a default choice). So, if $D_{a_r}(x)$ is the depth associated with the $r$-th projection, then

$$RP(x) = R^{-1} \sum_{r=1}^{R} D_{a_r}(x).$$

The extensions to multivariate functional data are similar to those proposed for the FM depth, except that here, to use a $p$-variate depth with the projections, it is not required that all components have a common support. The RPD depth proposed in Cuevas et al (2007) is an example of this extension using the original curves and their derivatives as components of multivariate functional data, which in this case are two-dimensional.
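The RP scheme can be sketched for discretized curves as follows. Several choices here are assumptions of this example rather than specifications from the text: the random directions are drawn as Gaussian white noise on the grid and normalized to unit $L_2$ norm, the scalar product is approximated with trapezoidal weights, and the univariate depth is taken as $\min\{F_N, 1 - F_N\}$.

```python
import numpy as np

def rp_depth(curves, t, n_proj=50, seed=0):
    """Mean over random projections of a univariate half-space-type depth."""
    rng = np.random.default_rng(seed)
    curves = np.asarray(curves, dtype=float)
    # Trapezoidal quadrature weights approximating the L2 scalar product
    w = np.zeros_like(t)
    step = np.diff(t)
    w[:-1] += step / 2
    w[1:] += step / 2
    depths = np.zeros(curves.shape[0])
    for _ in range(n_proj):
        a = rng.normal(size=t.size)          # assumed law for the direction
        a /= np.sqrt(np.sum(w * a ** 2))     # unit norm in L2
        proj = (curves * w) @ a              # <a, x_i> for every curve
        F = (proj[None, :] <= proj[:, None]).mean(axis=1)
        depths += np.minimum(F, 1.0 - F)
    return depths / n_proj

# Eleven vertically shifted curves; index 5 is the central one
t = np.linspace(0.0, 1.0, 101)
levels = np.linspace(-1.0, 1.0, 11)
curves = levels[:, None] + np.sin(2 * np.pi * t)[None, :]
depths = rp_depth(curves, t)
```

In this toy sample every projection orders the curves by their vertical shift (possibly reversed), so the central curve obtains the same, maximal, depth in each projection.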
2.1.4 Other Depth Measures

Some other functional depth measures have been proposed in recent years, although they are closely related to the three mentioned above. For instance, the Modified Band Depth (MBD) proposed in Lopez-Pintado and Romo (2009) can be seen as a particular case of the FM depth using the simplicial depth as the univariate depth. The works by Ieva and Paganoni (2013) and Claeskens et al (2014) are in the same spirit as the extension of the FM depth to multivariate functional data with common support. The first paper provides a generalization of the MBD that uses the Simplicial Depth as the $p$-variate depth, and the second uses the multidimensional Half Space depth.

The two proposals in Sguera et al (2014) are extensions to functional data of the multivariate spatial depth (see, e.g., Serfling (2004)). The two depths, called Functional Spatial Depth (FSD) and Kernelized Functional Spatial Depth (KFSD), have different meanings. The first one is a global depth, whereas the KFSD has a clear local pattern. We have tried them and found that FSD gives results very similar to FM or RP, while KFSD behaves like the hM depth. Because of this, we have included neither of them in the simulations and real case studies.


2.2 Classification Methods 

The last step in the $DD^G$-classifier procedure is to select a suitable classification rule. Fortunately, we now have a purely multivariate classification problem in dimension $G$, and many procedures are known to handle it successfully, based either on discriminant or on regression ideas (see, for example, Ripley (1996)).

Given their simplicity and/or the ease of drawing inferences from them, we have selected the following multivariate classification procedures to be used here:

1. Based on Discriminant Analysis: Linear Discriminant Analysis (LDA) is the most classical discriminant procedure. Introduced by Fisher, it is a particular application of the Bayes' Rule Classifier under the assumption that all the groups in the population have a normal distribution with different means, but the same covariance matrix. Quadratic Discriminant Analysis (QDA) is an extension relaxing the assumption of equality among covariance matrices.

2. Based on Logistic Regression Models: Here, the classifiers employ the logistic transformation to compute the posterior probability of belonging to a certain group using the information of the covariates. The Generalized Linear Models (GLM) combine linearly the information of vector $d$, whereas the Generalized Additive Models (GAM) (see Wood (2004)) relax the linearity assumption in GLMs, allowing the use of a sum of general smooth functions of each variate.

3. Nonparametric classification methods are based on nonparametric estimates of the densities of the groups. The simplest (and most classical) one is the so-called k-Nearest Neighbour (kNN) in which, given $k \in \mathbb{N}$, the point $d$ is assigned to the majority class of the $k$ nearest data points in the training sample. Another possibility is to estimate the probability of belonging to each group through the Nadaraya-Watson estimator using a common bandwidth for all data. This method will be denoted by NP. A kNN method could be considered an NP method using the uniform kernel and a locally selected bandwidth. These two methods are quite flexible and powerful but, unlike the previous ones, it is not easy to diagnose which part of the vector $d$ is important for the final result.
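As an illustration of the NP rule, the sketch below estimates the posterior probability of each group with a Nadaraya-Watson estimator applied to the class indicators, using a Gaussian kernel and a common bandwidth. The kernel choice, the bandwidth value and the toy depth vectors are assumptions of this example; in practice the bandwidth would be selected from the data.

```python
import numpy as np

def nw_posteriors(d_train, y_train, d_new, h):
    """Nadaraya-Watson estimates of P(group = c | d) with a common bandwidth h."""
    sq = ((d_new[:, None, :] - d_train[None, :, :]) ** 2).sum(axis=2)
    k = np.exp(-0.5 * sq / h ** 2)           # Gaussian kernel weights
    classes = np.unique(y_train)
    num = np.column_stack([k[:, y_train == c].sum(axis=1) for c in classes])
    # Rows with all weights equal to zero (h far too small) would need care
    return classes, num / num.sum(axis=1, keepdims=True)

# Hypothetical depth vectors d = (D_1, D_2) for two groups
d_train = np.array([[0.8, 0.2], [0.7, 0.3], [0.9, 0.1],
                    [0.2, 0.8], [0.3, 0.7], [0.1, 0.9]])
y_train = np.array([0, 0, 0, 1, 1, 1])
d_new = np.array([[0.8, 0.2], [0.2, 0.8]])

classes, post = nw_posteriors(d_train, y_train, d_new, h=0.2)
pred = classes[post.argmax(axis=1)]          # group with largest posterior
```

Replacing the Gaussian weights by an indicator of the $k$ nearest points recovers the kNN vote, which is the sense in which kNN can be viewed as an NP method with a uniform kernel and a locally selected bandwidth.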


There are many other classifiers that could be employed here, for instance: classification trees, artificial neural networks (ANN), support vector machines (SVM) or multivariate adaptive regression splines. However, the application of any of these methods usually involves the choice of several auxiliary parameters or designs that must be tailored to every particular application. Also, as in the case of nonparametric classification methods, the trade-off between interpretability and predictability of these methods is biased towards the latter.

The choice among the different classifiers could be influenced by their theoretical properties and/or how easy it is to draw inferences. For example, from the theoretical point of view, the kNN classifier can achieve optimal rates close to Bayes' risk (a complete review of this classifier can be found in Hall et al (2008)) and it could be considered the standard rule. But better inferences can be drawn from other classifiers such as LDA, GLM or GAM models.
3 Illustration of Regular Classification Methods in DD-Plots


3.1 Multivariate Example 

This section is devoted to exploring the different classifiers that can be applied to DD-plots as an alternative to the proposal in Li et al (2012). In that paper, given $k_0 = 1, 2, \ldots$, the classifier is the polynomial $f$, with degree at most $k_0$ such that $f(0) = 0$, that gives the lowest misclassification error in the training sample. We denote this classifier by DD$k_0$. The candidate polynomials are constructed by selecting $k_0$ points of the sample and taking the polynomial going through these points and the origin. In our implementation, we have skipped the step of selecting the order $k_0$ by cross-validation, providing the best result for $k_0 = 1, 2, 3$ using, in each case, $M$ initial combinations ($M = 10{,}000$ by default) and optimizing the best $m$ ones ($m = 1$ by default), following the implementation of Li et al (2012). Notice that the MD-classifier can be considered a particular case of DD1, fixing the slope at a value of 1.


Figure 3: From left to right, DD-plots using the DD1, DD2 and DD3 classifiers applied to the DD-plot in Figure 2(b). The depth in all cases is the HS. [Panel titles: DD-plot(HS,DD1), DD-plot(HS,DD2), DD-plot(HS,DD3).]

The application to the example in Figure 2(b) is plotted in Figure 3, which shows the results for the DD1, DD2 and DD3 classifiers. The titles of the subplots are in the general form DD-plot(depth, classif), where depth is the depth employed (HS denotes the multidimensional Half Space depth) and classif denotes the classification method. The sample points are colored gray or black to indicate the group they belong to. The background image is colored light gray and dark gray to indicate the areas where a new data point would be assigned to the gray and black groups, respectively. The misclassification error rates are, respectively, (0.262, 0.215, 0.201). There is a clear superiority of DD3 over the other classifiers, but there are some areas (see, for example, the rectangle $[0.0, 0.2] \times [0.0, 0.1]$) where a polynomial cannot satisfactorily classify the data.


Figure 4: From left to right, top to bottom, DD-plots using the LDA, QDA, kNN, GLM, GAM and NP classifiers applied to the DD-plot in Figure 2(b). The depth in all cases is the HS. [Panel titles: DD-plot(HS,lda), DD-plot(HS,qda), DD-plot(HS,knn), DD-plot(HS,glm), DD-plot(HS,gam), DD-plot(HS,np).]

Figure 4 shows the result of applying LDA, QDA, kNN, GLM, GAM and NP to the same data. The misclassification rates are, respectively, (0.472, 0.51, 0.136, 0.472, 0.152, 0.152). The LDA, QDA and GLM methods do not achieve the result obtained by DD3, which in turn is outperformed by kNN, GAM and NP. Notice that the optimal classifier gives a theoretical misclassification rate of 0.138, very close to the result obtained with kNN. The key to this improvement over the DD-classifier is the flexibility of kNN and GAM, which can model complicated situations like this one.


3.2 Functional Example: Tecator 


In this section, we use the Tecator dataset to illustrate our procedure. Later, in Section 4, this dataset will be revisited to compare the performance of the $DD^G$-classifier from the prediction point of view with other proposals.

Figure 5: Spectrometric curves of the Tecator dataset and their first two derivatives.

Those data, which here are treated as multivariate functional data, were drawn from a spectrometric study where the goal was to predict the fat content of meat slices using absorbance curves provided by the Tecator Infrared Food Analyzer device. Many papers have treated those data from the regression or the classification point of view (e.g., Ferraty and Vieu (2009), Febrero-Bande and Gonzalez-Manteiga (2013) and references therein) with the conclusion that the relevant information for those goals is located in the second derivative. Here, let us suppose that we are interested in identifying those samples with a percentage of fat above 15% ($\mathrm{ifat} = 1_{\{\mathrm{Fat} > 0.15\}}$) using the absorbance curves ($ab$) and their second derivatives ($ab2$) with the $DD^G$-classifier. First, concerning the depth, we use FM, RP and hM, where we have employed the univariate Mahalanobis depth to compute the first two and the usual $L_2$-distance between functions in hM, with the default choice of $h$ equal to the 0.15 quantile of the set $\{d(x_i, x_j), i \neq j\}$. Then, for each depth, at least five possibilities, identified through the different suffixes, can be explored:


.0: The depth only uses the original trajectories, $d = (D_0^0(x), D_1^0(x))$.

.2: The depth only uses the second derivatives, $d = (D_0^2(x), D_1^2(x))$.

.w: Use a weighted sum of the depth of the original trajectories and the depth of the second derivatives, $d = (D_0^w(x), D_1^w(x))$ with $D_i^w = 0.5 D_i^0 + 0.5 D_i^2$.

.m: Use all combinations depth/group, $d = (D_0^0(x), D_1^0(x), D_0^2(x), D_1^2(x))$.

.p: The depths of the trajectories and their derivatives (a two-dimensional functional dataset) are combined within the depth procedure. With FM and RP depths we use the two-dimensional Mahalanobis depth. The hM method uses a Euclidean metric as in (4), $d = (D_0^p(x), D_1^p(x))$.

Figure 6: Example of pairs of hM depths used by the GLM classifier with the spectrometric curves of the Tecator dataset and their second derivative.


As mentioned above, the distance correlation proposed in Székely et al (2007) can help to detect
the depth that best summarizes the variate ifat. The distance correlations between the group
variate (ifat) and the different depths are shown in Table 1. Since this measure only uses the
distances among data, it can also be computed with respect to the functional covariates:
R(ifat, ab) = 0.14, R(ifat, ab2) = 0.77, supporting the idea that the important information for
classification is contained in the second derivative. In Table 1 we also see that the depths based
on the second derivative explain at least the same amount of information as the functional
covariate does. In particular, FM.2, RP.2, hM.2, hM.w and hM.m have values over 0.7. The first
derivative (ab1) was not considered here because its distance correlation with ifat
(R(ifat, ab1) = 0.63) is lower than that of the second one, and the two are closely related
(R(ab1, ab2) = 0.86). So, if we must select just one depth, hM.2 should be chosen. In a second
step, if we want to add more information, it is preferable to include the original trajectories
because of their lower distance correlation with ab2 (R(ab, ab2) = 0.23).

              FM.0    FM.2    FM.w    FM.m    FM.p
R(ifat, d)   0.058   0.771   0.393   0.365   0.058

              RP.0    RP.2    RP.w    RP.m    RP.p
R(ifat, d)   0.065   0.774   0.407   0.396   0.065

              hM.0    hM.2    hM.w    hM.m    hM.p
R(ifat, d)   0.114   0.789   0.706   0.762   0.114

Table 1: Distance correlation between ifat and the different options for depths for the Tecator
dataset.
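
The sample distance correlation used for this screening can be sketched in a few lines of Python. The version below works on pairwise-distance matrices, so the same function applies to a scalar depth, a group label or a functional covariate once a distance has been chosen. The function names are illustrative, and this is a plain O(n^2) sketch rather than the code used to obtain the reported values.

```python
import math


def pairwise_abs(values):
    """Pairwise distance matrix |v_i - v_j| for scalar observations."""
    return [[abs(a - b) for b in values] for a in values]


def _double_center(dist):
    """Subtract row and column means and add back the grand mean."""
    n = len(dist)
    row = [sum(r) / n for r in dist]  # equals the column means (symmetric matrix)
    grand = sum(row) / n
    return [[dist[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(dx, dy):
    """Sample distance correlation from two n x n pairwise-distance matrices."""
    a, b = _double_center(dx), _double_center(dy)
    n = len(dx)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for r in a for v in r) / n ** 2
    dvar_y = sum(v * v for r in b for v in r) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)
```

For a functional covariate, the matrix dx would hold the chosen distance between curves (for instance an L2 distance), while dy would hold the distances between the 0/1 group labels.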

The next step is to select a classifier that takes advantage of the dependence found by the distance
correlation measure. The kNN could seem to be a good choice because it is quite simple to
implement. But from the diagnosis point of view, a classifier like the GLM may be preferable. Using
the hM.m depth (second best choice), we have four variates: ab.mode.0, ab.mode.1, ab2.mode.0,
ab2.mode.1, where the notation var.depth.group stands for the depth computed for variate var
with respect to the points in the group group.

The result using a GLM classifier is shown in Figure 6 with the combinations of the four variates,
showing clearly that those associated with the second derivative separate the two groups more
efficiently. More interesting is that the contribution of each component can be assessed through
the diagnosis of the GLM. The classical diagnosis of the estimates of a GLM model is shown in
Table 2, where the variates associated with the depths of the second derivative are both clearly
significant while this is not true for the original curves.
              Estimate   Std. Error   z value   P(>|z|)
(Intercept)      3.538        2.161     1.637     0.102
ab.mode.0       -0.473        0.166    -2.841     0.004
ab.mode.1        0.054        0.155     0.347     0.729
ab2.mode.0      -0.471        0.103    -4.585     0
ab2.mode.1       1.09         0.301     3.624     0

Table 2: Output for the GLM classifier in the Tecator dataset.
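
The last column of Table 2 can be checked from the z values alone. The short Python sketch below assumes the usual Wald test with a standard normal reference distribution, so that the two-sided p-value is 2(1 - Φ(|z|)); the function name is illustrative.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# z values as printed in Table 2
z_values = {
    "(Intercept)": 1.637,
    "ab.mode.0": -2.841,
    "ab.mode.1": 0.347,
    "ab2.mode.0": -4.585,
    "ab2.mode.1": 3.624,
}
p_values = {name: two_sided_p(z) for name, z in z_values.items()}

for name, p in p_values.items():
    print(f"{name:12s} p = {p:.4f}")
```

Under this assumption the computed values agree with the printed column up to rounding, and the two ab2 terms have p-values that round to zero at three decimals.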


4 A Simulation Study and the Analysis of Some Real Datasets


Four models (inspired by those in Cuevas et al (2007)) were simulated in order to check the
performance of the proposed classifier. In all cases, the curves are obtained from the process
X_{.j}(t) = m_j(t) + e_{.j}(t), where m_j is the mean function of group j = 1, 2 and e_{.j} is a Gaussian
process with zero mean and Cov(e_{.j}(s), e_{.j}(t)) = θ_j exp(-|s - t|/0.3). In all the models, θ_1 = 0.5
and θ_2 = 0.25, giving the second group half the error variance of the first. The mean functions
include an additional parameter k which is fixed at k = 1.1. Note that Cuevas et al (2007) takes
k = 1.2, which makes the classification task easier due to a bigger separation of the groups. The
functions were generated in the interval [0, 1] using an equispaced grid of 51 points. These models
were chosen trying to preserve a high similarity between groups jointly in the original trajectories
and in their derivatives.


• Model 1: The population P1 has mean m_1 = 30(1 - t)t^k. The mean for P2 is m_2 = 30(1 - t)^k t.

• Model 2: The population P1 is the same as in Model 1 but P2 is composed of two subgroups
as a function of a binomial variate I with P(I = 1) = 0.5. Here, m_{2,I=0} = 25(1 - t)^k t and
m_{2,I=1} = 35(1 - t)^k t.

• Model 3: Both populations are composed of two subgroups, with means m_{1,I=0} = 22(1 - t)t^k
and m_{1,I=1} = 30(1 - t)t^k in the first population and m_{2,I=0} = 26(1 - t)^k t and
m_{2,I=1} = 34(1 - t)^k t in the second one.

• Model 4: This uses the same subgroups defined in Model 3 but considers each subgroup as
a group itself. So, this is an example with four groups.

Figure 7: A sample of 20 functions for every simulation model (panels: Model 1, Model 2 and
Model 3, all with k = 1.1) along with the means of each sub-group (m1's (black lines) and m2's
(gray lines)).

Thus, Models 1 and 4 are unimodal, while Models 2 and 3 contain at least one multimodal group.
In the latter two models, the hM depth (which is local) should do better than the other ones.
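
The data-generating process of this section can be sketched in Python as follows. Because the covariance θ exp(-|s - t|/0.3) is that of a stationary Ornstein-Uhlenbeck process, the noise can be drawn exactly on an equispaced grid with a first-order autoregressive recursion, which avoids a matrix factorization. The function names are illustrative, and the mean functions are taken here to be scale·(1 - t)t^k for the first population and scale·(1 - t)^k t for the second, which is an assumption about how the model formulas should be read.

```python
import math
import random


def mean_fn(t, scale, k, group):
    """Assumed mean shapes: scale*(1-t)*t^k (group 1) or scale*(1-t)^k*t (group 2)."""
    if group == 1:
        return scale * (1.0 - t) * t ** k
    return scale * (1.0 - t) ** k * t


def simulate_curve(rng, scale, k, group, theta, n_grid=51, range_par=0.3):
    """One curve on [0, 1]: mean plus Gaussian noise with Cov = theta*exp(-|s-t|/range_par)."""
    rho = math.exp(-(1.0 / (n_grid - 1)) / range_par)  # lag-one correlation on the grid
    innov_sd = math.sqrt(theta * (1.0 - rho ** 2))
    e = rng.gauss(0.0, math.sqrt(theta))  # stationary start
    curve = []
    for i in range(n_grid):
        t = i / (n_grid - 1)
        curve.append(mean_fn(t, scale, k, group) + e)
        e = rho * e + innov_sd * rng.gauss(0.0, 1.0)  # exact AR(1) update
    return curve


def simulate_model1(rng, n_per_group, k=1.1):
    """Model 1 sample: theta_1 = 0.5 for group 1 and theta_2 = 0.25 for group 2."""
    curves, labels = [], []
    for group, theta in ((1, 0.5), (2, 0.25)):
        for _ in range(n_per_group):
            curves.append(simulate_curve(rng, 30.0, k, group, theta))
            labels.append(group)
    return curves, labels
```

Calling simulate_model1(random.Random(0), 100) gives a training sample of the size used for Model 1; the mixture models would draw the scale (for example 25 or 35) at random for each curve.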

The simulation results are based on 200 independent runs. In every run, N = 200 training
observations for Models 1 and 2 (100 for each group), and a test sample of 50 observations from
each group were generated. For Models 3 and 4, N = 400 training observations are generated
(100 for each subgroup). Tables 3 to 6 show the misclassification rates for the test samples. Some
curves obtained with each model are presented in Figure 7.

For the comparison, the FM, RP and hM depths (computed with the default choices explained
above) were employed using the original trajectories and/or the derivatives of every curve,
which were computed using splines. The different depth options are denoted as in the Tecator
example above, except that the first derivative (.1) is used instead of the second one.

The distance correlation R is computed to select the best option from among the different depths
(first row of Tables 3 to 6). The overall winner is hM.w (closely followed by hM.p and hM.m),
suggesting that the combined information of the curves and the first derivatives is better than
using only one of them. This is quite a difficult example for a classification task, as can be deduced
from the relatively small distance correlations obtained. As a reference, we have computed the
Functional kNN (FkNN) for all examples.

The list of classifiers includes DD1, DD2 and DD3 as classical classifiers and also LDA, QDA,
kNN, NP, GLM and GAM. Note that the procedures DDi, i = 1, 2, 3, cannot be used with the .m
option.



        FM.0  FM.1  FM.w  FM.p  FM.m  RP.0  RP.1  RP.w  RP.p  RP.m  hM.0  hM.1  hM.w  hM.p  hM.m
R(Y,d)  0.23  0.28  0.34  0.32  0.32  0.34  0.25  0.36  0.36  0.38  0.38  0.42  0.50  0.49  0.47
DD1     27.2  21.5  20.3  21.2     -  24.5  16.7  17.6  17.8     -  20.3  15.9  15.6  15.7     -
DD2     24.9  20.4  18.6  19.4     -  24.0  15.4  17.4  17.6     -  18.7  13.7  12.6  12.8     -
DD3     25.1  20.4  18.7  19.5     -  24.4  15.8  17.1  17.3     -  19.0  14.0  13.1  13.3     -
LDA     24.2  20.3  18.4  19.4  17.9  24.6  15.6  17.5  17.5  15.5  18.4  13.1  12.2  12.3  11.7
QDA     24.5  20.3  18.4  19.5  18.1  25.1  15.8  17.5  17.6  16.4  18.5  13.2  12.1  12.3  11.9
kNN     28.2  23.0  20.8  21.9  20.7  27.4  18.0  19.3  19.3  17.7  20.4  15.1  13.6  13.9  13.3
NP      28.9  24.0  21.6  22.3  18.7  29.0  18.7  20.2  20.2  16.2  21.5  15.8  14.6  14.7  12.5
GLM     24.1  19.9  18.3  19.2  17.8  24.3  15.6  17.2  17.2  15.4  18.4  13.1  12.1  12.3  11.6
GAM     24.2  20.0  18.0  19.0  17.8  23.8  15.2  16.7  16.9  15.2  18.2  13.1  12.1  12.2  11.7

Table 3: Distance correlation and misclassification rates for Model 1. Mean of 200 runs.

The complete results for Model 1 are summarized in Table 3, where the results of the distance
correlation are, broadly speaking, confirmed: the best results are obtained with hM.m, closely
followed by hM.w and hM.p. In these columns, the linear classifiers (LDA, GLM) seem to work
slightly better than the others. This means that the simplest linear models are able to perform
the classification task successfully. The FkNN was computed in its three versions: .0, .1 and .p,
where the latter uses the Euclidean distance combining the first two. The results obtained were,
respectively, 23.04%, 18.93%, 19.02%.



        FM.0  FM.1  FM.w  FM.p  FM.m  RP.0  RP.1  RP.w  RP.p  RP.m  hM.0  hM.1  hM.w  hM.p  hM.m
R(Y,d)  0.24  0.16  0.22  0.32  0.26  0.28  0.13  0.24  0.32  0.24  0.48  0.34  0.50  0.44  0.48
DD1     32.5  26.6  25.5  16.1     -  31.1  22.3  22.0  21.6     -  14.0  15.8  10.6  10.2     -
DD2     22.8  27.1  21.0  16.0     -  20.8  21.1  16.6  16.3     -  11.5  16.0  10.6   9.9     -
DD3     23.0  27.4  21.3  16.1     -  20.8  20.9  16.6  16.3     -  11.8  16.2  10.8  10.2     -
LDA     22.0  26.4  20.3  16.1  17.9  21.1  20.3  16.4  17.0  15.5  12.3  15.1  10.0   9.7  10.1
QDA     22.3  26.7  20.6  15.9  18.7  20.8  20.5  16.1  16.5  16.0  11.9  14.9   9.8   9.3   9.8
kNN     25.8  30.7  24.0  18.2  21.1  22.0  23.6  17.9  18.1  17.5  12.7  17.3  11.5  10.8  11.5
NP      26.7  31.5  25.0  18.9  18.8  22.9  24.4  18.9  19.1  16.5  13.3  18.1  12.3  11.6  10.7
GLM     22.1  26.3  20.2  15.6  17.9  19.7  20.1  15.4  15.9  15.0  11.7  15.1   9.7   9.3   9.7
GAM     22.4  26.4  20.5  15.5  18.1  19.5  20.1  15.2  15.7  15.2  11.0  15.3   9.9   9.3   9.6

Table 4: Distance correlation and misclassification rates for Model 2. Mean of 200 runs.
Model 2 (Table 4) is a difficult scenario for methods based on RP and FM depths, as can be deduced
from the low values of the distance correlation. These methods work well when the groups are
homogeneous rather than being made up of subgroups as in this case. The lowest misclassification
error is obtained by the combinations hM.p-QDA, hM.p-GLM and hM.p-GAM (9.3%), although
many classifiers based on hM.w, hM.p or hM.m have misclassification rates under 10%. The
results for FkNN were 13.63%, 13.31%, 12.57%.



        FM.0  FM.1  FM.w  FM.p  FM.m  RP.0  RP.1  RP.w  RP.p  RP.m  hM.0  hM.1  hM.w  hM.p  hM.m
R(Y,d)  0.08  0.24  0.18  0.22  0.16  0.16  0.27  0.23  0.30  0.32  0.32  0.38  0.41  0.38  0.40
DD1     30.9  27.9  29.9  28.4     -  31.2  27.9  29.7  29.5     -  27.4  23.3  24.8  25.5     -
DD2     29.5  24.5  26.1  22.0     -  29.8  24.9  27.2  27.0     -  19.4  17.9  16.3  16.9     -
DD3     25.5  24.3  21.8  21.4     -  28.3  23.2  23.4  23.3     -  19.4  18.1  16.5  17.0     -
LDA     32.0  25.2  29.8  26.4  25.1  32.0  27.0  30.3  30.6  27.0  24.0  18.4  19.1  20.2  18.2
QDA     28.4  23.9  24.9  22.3  21.5  30.3  24.7  26.6  26.6  23.4  21.9  17.7  17.7  18.6  16.9
kNN     25.9  25.0  22.2  21.3  21.6  27.3  23.6  22.9  22.9  22.4  20.3  18.2  16.6  17.2  16.7
NP      25.5  24.3  22.1  21.3  21.3  27.7  23.3  22.5  22.8  21.7  19.9  17.9  16.4  17.0  16.4
GLM     32.0  25.1  29.8  26.4  25.2  32.3  27.2  30.4  30.6  27.3  23.8  18.3  18.7  20.0  18.0
GAM     24.8  23.8  21.2  20.6  21.6  26.1  22.9  22.1  21.8  21.8  19.4  17.6  16.2  16.8  16.2

Table 5: Distance correlation and misclassification rates for Model 3. Mean of 200 runs.


Model 3 (Table 5) is even harder for RP and FM methods. In both cases, the use of the first
derivative is better than the use of the original curves or a weighted version of them. For these
depths, the best misclassification errors are obtained using the combined information (FM.p-GAM
(20.6%) and RP.m-NP (21.7%)). This is also true for the hM method, but it consistently yields
lower misclassification errors. The best combinations are hM.w-GAM and hM.m-GAM (16.2%),
which are better than the results using FkNN: 23%, 21.4%, 21.21%.



        FM.0  FM.1  FM.w  FM.p  FM.m  RP.0  RP.1  RP.w  RP.p  RP.m  hM.0  hM.1  hM.w  hM.p  hM.m
R(Y,d)  0.60  0.47  0.65  0.65  0.63  0.60  0.56  0.67  0.69  0.66  0.64  0.58  0.69  0.68  0.68
DD1     21.6  29.1  17.8  18.1     -  23.9  19.5  16.9  16.8     -  19.5  17.9  14.6  14.4     -
DD2     21.7  28.9  17.2  17.7     -  23.9  18.9  16.7  16.4     -  17.9  16.6  12.7  12.5     -
DD3     23.0  29.6  18.9  19.2     -  25.2  20.3  18.3  18.1     -  19.4  18.0  14.6  14.4     -
LDA     21.0  27.4  16.4  16.9  16.8  23.2  18.3  15.9  16.0  14.3  17.6  15.9  12.5  12.1  12.0
QDA     21.0  28.0  17.0  18.4  17.4  23.8  18.9  16.6  16.5  16.3  17.9  16.2  11.8  12.2  12.7
kNN     21.5  30.2  17.1  17.6  17.4  23.0  19.2  16.1  16.0  15.3  17.3  16.5  12.0  12.2  12.4
NP      20.7  28.2  16.6  17.1  17.0  22.3  18.6  15.8  15.6  15.3  17.0  16.1  11.9  12.0  12.4
GLM     20.9  27.7  15.9  16.4  16.1  23.0  18.0  15.5  15.4  14.0  16.6  15.8  11.4  11.3  11.3
GAM     20.6  27.5  16.0  16.5  16.9  21.8  17.9  15.1  15.1  14.5  15.9  15.8  11.3  11.5  12.1

Table 6: Distance correlation and misclassification rates for Model 4. Mean of 200 runs.
The results for Model 4 (Table 6) are better than those for Model 3, supporting the idea that
homogeneous groups are easier to classify with RP and FM depths. In all cases, the weighted
version improves on the classification obtained with each component alone. This hints that the two
components carry complementary pieces of the information needed for classification. The best
combinations for each depth are: FM.w-GLM (15.9%), RP.m-GLM (14%) and hM.w-GAM, hM.p-GLM,
hM.m-GLM (11.3%). The FkNN gives quite disappointing results: 19.8%, 21.45%, 20.16%, probably
due to the difficulty of the scenario.


4.1 Application to Real Datasets 


We have applied our proposal to several popular datasets in the functional data analysis literature.
A nice review on functional classification can be seen in Baíllo et al (2010). In what follows, we
will briefly describe the datasets, the results found in the literature and our best results using
the DD^G-classifier.


Tecator: When the Tecator dataset is used for classification, several differences in the scheme
employed can be found in the literature, including the cutoff for groups, the size of the
training and testing samples and even the number of runs. In Febrero-Bande and Oviedo
de la Fuente (2012), the scheme cutoff=15%/train=165/test=50/runs=500 is employed,
with a best result of 2.1% misclassification error for an FKGAM model. Here, using depths,
the best result is 1.3% with the hM.2-DD2 model. The classical FkNN using the second
derivative obtains 1.9%.

In Galeano et al (2015) a misclassification error of 1% is reported using a centroid method
with the functional Mahalanobis semidistance and with the scheme cutoff=20%/train=162
/test=53/runs=500. Following the same scheme but with 200 runs, the hM.2-DD2 (error
rate: 1.3%) performs quite well and slightly better than the classifier using kNN (2.5%). In
fact, all the classifiers using the second derivative show misclassification rates in the interval
[1.3%, 3.3%], which can be compared with the FkNN classifier that obtains 1.92%.

Berkeley Growth Study: This dataset contains the heights of 39 boys and 54 girls from age
1 to 18. It constitutes a classical example included in Ramsay and Silverman (2005) and in
the fda R-package.

As a classification problem, this dataset was treated in Baíllo and Cuevas (2008), where, using
a FkNN procedure, a best cross-validation misclassification rate of 3.23% was obtained. In
our application, the best result is obtained by the combinations hM.0-LDA and hM.0-QDA
with 2.2%.

Phoneme: The phoneme dataset is also quite popular in the FDA community although its
origins are in the area of Statistical Learning (see Hastie et al (1995)). The dataset has 2000
log-periodograms of 32 ms duration corresponding to five different phonemes (sh, dcl, iy, aa,
ao).

It appeared as a functional classification problem in Ferraty and Vieu (2003). Randomly
splitting the data into training and test samples with 250 cases, 50 per class, in each sample,
and repeating the procedure 200 times, the best result achieved by the authors was an 8
misclassification rate. With our proposals, the combination hM.m-LDA misclassifies 7.

This dataset was also used in Delaigle and Hall (2012) but it was restricted to the use of the
first 50 discretization points and to the binary case using the two most difficult phonemes,
(aa, ao), obtaining a misclassification rate of 20% when N = 100. Our best result is 18.6%,
obtained by hM.w-QDA, although most hM procedures yield errors below 20%.

MCO Data: These curves correspond to mitochondrial calcium overload (MCO), measured
every 10 seconds for an hour in isolated mouse cardiac cells. The data (two groups: control
and treatment) were used as functional data in Cuevas et al (2004) for ANOVA testing and
the dataset is available in the fda.usc package.

As an FDA classification problem, it was considered in Baíllo and Cuevas (2008) where, using
a cross-validation procedure, a best error rate of 11.23% was obtained. Our best results are
the combinations hM.1-DD1, hM.m-LDA, hM.m-QDA and hM.m-NP with an error rate of
2 .


Cell Cycle: This dataset contains temporal gene expression measured every 7 minutes (18
observations per curve) of 90 genes involved in the yeast cell cycle. The data were originally
obtained by Spellman et al (1998) and used in Leng and Müller (2006) and Rincon Hidalgo
and Ruiz Medina (2012) with the goal of classifying these genes into two groups. The first
group has 44 elements related to G1 phase regulation. The remaining 46 genes make up
the second group and are related to the S, S/G2, G2/M and M/G1 phases. The dataset has
several missing observations which were imputed in this work using a B-spline basis of 21
elements.

Both papers cited above obtain a misclassification rate of 10% (9 misclassified genes) but
with a different number of errors for each group. Our proposal achieves a 6.7% rate with the
combinations hM.1-DD1, hM.w-kNN, hM.w-NP, hM.p-DD1, hM.m-kNN and hM.m-NP, but
almost all procedures based on hM.1 or hM.w yield a misclassification rate of 8.9% at most.


Kalivas: This example comes from Kalivas (1997). It was used for classification in Delaigle
and Hall (2012). It contains near-infrared spectra of 100 wheat samples from 1100 nm to
2500 nm in 2 nm intervals. Two groups are constructed using the protein content of each
sample, using a binary threshold of 15% that places 41 data in the first group and 59 in the
second.

Our best result for 200 random samples of size 50 was the combination FM.m-QDA with
a 3.7% misclassification error. This rate is quite far from the best in Delaigle and Hall
(2012) (CENT_PC1 = 0.22%) using the centroid classifier, but the latter requires projecting
in a specific direction that in this case corresponds to small variations on the subinterval
[1100, 1500]. Notice that any depth procedure based on the whole interval cannot achieve
a better result than a technique focused on the small interval that contains the relevant
information for the discrimination process.
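
The evaluation schemes quoted for these datasets (for instance train=165/test=50/runs=500 for Tecator, which together use all 215 samples) amount to repeated random splitting. A minimal Python sketch of such a scheme is given below; the function names are hypothetical, and the one-dimensional nearest-centroid rule is only a toy stand-in for any classifier applied to depth coordinates.

```python
import random


def repeated_holdout(x, y, n_train, n_test, runs, fit_predict, seed=0):
    """Mean test misclassification rate over `runs` random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    rates = []
    for _ in range(runs):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:n_train + n_test]
        pred = fit_predict([x[i] for i in train], [y[i] for i in train],
                           [x[i] for i in test])
        errors = sum(1 for p, i in zip(pred, test) if p != y[i])
        rates.append(errors / n_test)
    return sum(rates) / runs


def nearest_centroid_1d(x_train, y_train, x_test):
    """Toy classifier: assign each test value to the group with the closest mean."""
    groups = {}
    for value, label in zip(x_train, y_train):
        groups.setdefault(label, []).append(value)
    centroids = {g: sum(v) / len(v) for g, v in groups.items()}
    return [min(centroids, key=lambda g: abs(value - centroids[g]))
            for value in x_test]
```

Differences in the cutoff, the split sizes or the number of runs change the estimated rate, which is why the comparisons above restate the scheme used in each reference.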


5 Conclusions 


In this paper we present a procedure that extends the DD-classifier proposed in Li et al (2012)
and adapts it to the functional context in several ways:


• Due to the flexibility of the new classifiers considered, the proposal can deal with several
depths or with more than two groups in the same integrated framework. In fact, the DD^G-
classifier converts the data into a multivariate dataset whose columns are constructed using
depths, and the new classifiers are classical multivariate classifiers based on discrimination
(LDA, QDA) or regression procedures (kNN, NP, GLM, GAM). More classifiers (such as SVM
or ANN) could be considered here without changing the procedure much. The choice
of a classifier must be based on the weaknesses and the strengths of each one. For instance,
for the diagnostic part, classifiers such as LDA or GLM are recommended because the rule
for separating groups is easier to interpret, perhaps at some cost in predictive
performance.

• The DD^G-classifier is especially interesting in a high-dimensional or functional framework
because it changes the dimension of the classification problem from large or infinite to G,
where G depends only on the number of groups under consideration and the number of
depths that the statistician decides to employ, perhaps times the number of sources of
information to be used. For instance, if we have 3 groups in the data and the method is
using 2 different depths, the multivariate dimension of the DD^G-classifier is 6. Clearly, this
is a more tractable dimension for the problem, but there are, in addition, some ways to
reduce this number. In this paper, a review of functional data depths is made by including
modifications to summarize multivariate functional data (the data are made up of vectorial
functions) without increasing, or even while reducing, the dimension of the problem at hand.

In a multivariate setting, this might not be so advantageous because the dimension G is a
multiple of the number of groups and it could sometimes be greater than the dimension of
the original space. For instance, in the classical example of the Fisher Iris data, there are four
variables and three groups, so the DD^G-classifier map in its simplest form works in
dimension three. But we can also consider a univariate depth for each variable,
and then the dimension G grows up to twelve.

• The execution time for each method, measured in CPU seconds, depends on the complexity of
the combination depth/classifier. Taking Model 1 with the original curves as a reference,
the fastest time is obtained by the combination FM-LDA (0.05s). Similar times are obtained
by the QDA and GLM methods. The use of the GAM classifier adds 0.07s. The nonparametric
classifiers (NP, kNN) typically add 0.35-0.40s to the time due to the computation of the
distance matrix among points in the DD-plot. The use of random projections increases
the time by 0.01s per combination and the computation of the hM depth takes 1.05s, which is
the time employed by the FkNN. The use of a combined depth option (.w, .p, .m) doubles
the execution time. The DDk choices (k = 1, 2, 3) take 0.07, 13.77 and 39.77s, respectively,
with the default choice (M = 10000, m = 1), although better execution times can be achieved
using M = 500, m = 50 while maintaining similar misclassification rates.
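
The dimension count discussed in the second point above can be made concrete with a small Python sketch. It maps a multivariate observation to one univariate Mahalanobis depth per variable and group, using the standard form 1/(1 + (x - mean)^2/variance), and counts the resulting coordinates. The function names are illustrative, and this toy map is only meant to show how G grows with groups, depths and sources; it is not the classif.DD implementation.

```python
def mahalanobis_depth_1d(x, sample):
    """Univariate Mahalanobis depth; assumes the sample is not constant."""
    n = len(sample)
    mu = sum(sample) / n
    var = sum((v - mu) ** 2 for v in sample) / (n - 1)
    return 1.0 / (1.0 + (x - mu) ** 2 / var)


def ddg_map(point, groups):
    """One depth per (group, variable) pair; `groups` maps a label to a list of points."""
    return [mahalanobis_depth_1d(point[v], [p[v] for p in pts])
            for pts in groups.values()
            for v in range(len(point))]


def ddg_dimension(n_groups, n_depths=1, n_sources=1):
    """G = groups x depths x sources of information."""
    return n_groups * n_depths * n_sources
```

With three groups, ddg_dimension(3, 2) gives 6 for two depths, and ddg_dimension(3, 1, 4) gives 12 for an Iris-like setting with one univariate depth per variable.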


The functions needed to perform this procedure are freely available at CRAN in the fda.usc
package (Febrero-Bande and Oviedo de la Fuente (2012)) in versions higher than 1.2.2.
classif.DD is the principal function and contains all the options shown in this paper related
to depths and classifiers. Most figures we present are regular outputs of this function.


SUPPLEMENTAL MATERIALS 


Supplemental Code: Rar compressed file containing the code with the plots and results in the
paper (paper.code.R), the code of the simulation studies (simul.xxx.R) and the code for
applications to real datasets (classif.xxxx.R). A folder with the data is also included (rar
file).
